Abstract. Modeling non Gaussian and non stationary signals and images
has always been one of the most important parts of signal and image
processing methods. In this paper, we first propose a few new models,
all based on hidden variables, for modeling signals and images that are
either stationary but non Gaussian, Gaussian but non stationary, or non
Gaussian and non stationary. Then, we show how to use these models
in independent component analysis (ICA) or blind source separation
(BSS). The computational aspects of the Bayesian estimation framework
associated with these prior models are also discussed.

1 Introduction 

In many signal and image processing methods, and in particular in ICA or
in BSS, the first step is the prior modeling of the signals or images. Here,
we consider only probabilistic modeling, where the samples of a signal
$\{f(t), t = 1, \ldots, T\}$ are represented by a random vector $f$ to which
we assign a probability law $p(f)$. The main problem is then to choose an
expression for $p(f)$ that represents a particular family of signals or images.
For example, choosing a Gaussian expression $p(f) = \mathcal{N}(f|0, P_0)$
with $P_0$ an identity matrix will represent a stationary signal. In this case,
we have $p(f) = \prod_t p(f(t))$ and the expression of $p(f(t))$ is Gaussian
and does not depend on $t$. The main objective of this paper is to consider the
cases where $p(f)$ is not Gaussian and/or is not separable and/or, if it is
separable, $p(f) = \prod_t p_t(f(t))$, the expression of $p_t(f(t))$ depends
on time $t$. In all these expressions, we can replace $t$ by $r$, representing
the position index of a pixel, for the case of images.

One of the tools to model non Gaussianity is to use a mixture of probability
laws, and in particular, the mixture of Gaussians:

$$p(f(t)) = \sum_{k=1}^{K} \alpha_k \, \mathcal{N}(f(t)|\mu_k, v_k)$$

where $\theta = \{(\alpha_k, \mu_k, v_k), k = 1, \ldots, K\}$ are the parameters
of the mixture and where $\sum_{k=1}^{K} \alpha_k = 1$. When interpreting
$\alpha_k = P(z(t) = k)$ with $z(t)$ a hidden variable, we can write
$p(f(t)|z(t) = k) = \mathcal{N}(f(t)|\mu_k, v_k)$, which gives the possibility
to consider $z(t)$ as a classification label for the samples of the signal $f(t)$.
This also gives the possibility to introduce non stationarity in the modeling of
$f(t)$ by letting $z(t)$ change in a given way with time.

Another tool which also gives the possibility to introduce non Gaussianity
and non stationarity is to consider the parameters of $p(f)$ to be random or
to change in time. One such example is:

$$p(f(t)|v(t), \lambda) = \mathcal{N}(f(t)|0, 2v(t)/\lambda) \quad\text{and}\quad p(v(t)|\lambda) = \mathcal{G}(v(t)|3/2, \lambda)$$

Here again, $p(f(t))$ is non Gaussian, and by letting $v(t)$ change with time, we
can also obtain a non stationary signal.

In this paper, we explore a few cases of such models, and in particular
the mixture of Gaussians model with a hidden Markovian model, for different
applications. We consider, in particular, the case of ICA or BSS where this
kind of model is used for the sources or for the components.

The rest of this paper is organized as follows: In Section II, a set of
Gaussian/non Gaussian and/or stationary/non stationary models and their
properties are presented. In Section III, we see how to use them as a prior
law in a Bayesian framework, first in ICA and then in BSS. In Section IV, the
Bayesian computational aspects related to the use of these models are discussed.

2 Gaussian/Non Gaussian and stationary/Non stationary 
2.1 Gaussian and stationary models 

We denote the sample $f(t_j)$ by $f_j$ and the whole set of samples by
$f = \{f_j, j = 1, \ldots, T\}$, and consider the Gaussian probability density
function (pdf) $p(f) = \mathcal{N}(f|\bar{f}, P_0)$ with the mean $\bar{f}$ and
the covariance matrix $P_0$. In a first step, we assume $\bar{f} = 0$.
Three particular cases are then of interest:

- $P_0 = \sigma_f^2 I$. This is the case where the $f_j$ are assumed centered,
  Gaussian and i.i.d.:

$$p(f) \propto \exp\left[-\frac{1}{2\sigma_f^2} \|f\|^2\right] \tag{1}$$



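- $P_0 = \sigma_f^2 C C^t$. This is the case where the $f_j$ are assumed
  centered, Gaussian but correlated. The vector $f$ is then considered to be
  obtained by $f = C\xi$, where $C$ corresponds to a moving average (MA)
  filtering and $p(\xi) = \mathcal{N}(0, \sigma_f^2 I)$. In this case, for an
  invertible $C$, we have:

$$p(f) \propto \exp\left[-\frac{1}{2\sigma_f^2} f^t (C C^t)^{-1} f\right] \propto \exp\left[-\frac{1}{2\sigma_f^2} \|C^{-1} f\|^2\right] \tag{2}$$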



- $P_0 = \sigma_f^2 (D^t D)^{-1}$ with $D = (I - A)$. This is the case where
  the $f_j$ are assumed centered, Gaussian and auto-regressive: $f = Af + \xi$
  with $A$ a matrix obtained from the AR coefficients and
  $p(\xi) = \mathcal{N}(0, \sigma_f^2 I)$. In this case, we have

$$p(f) \propto \exp\left[-\frac{1}{2\sigma_f^2} \|D f\|^2\right] \tag{3}$$



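A particular case of the AR model is the first order Markov chain

$$p(f_j|f_{j-1}) = \mathcal{N}(f_{j-1}, \sigma_f^2) \quad\text{with}\quad f_0 = 0 \tag{4}$$

with corresponding $A$ and $D = I - A$ matrices

$$A = \begin{pmatrix} 0 & & & \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}$$

which give the possibility to write

$$p(f) \propto \exp\left[-\frac{1}{2\sigma_f^2} \sum_j (f_j - f_{j-1})^2\right] \propto \exp\left[-\frac{1}{2\sigma_f^2} \|D f\|^2\right] \tag{5}$$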
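
As a small numerical check of the first order Markov chain, the following Python sketch (standard library only) builds the first order finite difference matrix $D = I - A$ and compares $\|Df\|^2$ with the sum of squared increments. The function names are illustrative, and the initial value $f_0 = 0$ is an assumption of this sketch.

```python
import random

def first_difference_matrix(T):
    """D = I - A, with A holding ones on the first subdiagonal."""
    D = [[0.0] * T for _ in range(T)]
    for i in range(T):
        D[i][i] = 1.0
        if i > 0:
            D[i][i - 1] = -1.0
    return D

def matvec(M, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def simulate_chain(T, sigma, rng):
    """f_j = f_{j-1} + xi_j with xi_j ~ N(0, sigma^2), starting from f_0 = 0 (assumed)."""
    f, previous = [], 0.0
    for _ in range(T):
        previous += rng.gauss(0.0, sigma)
        f.append(previous)
    return f

rng = random.Random(0)
f = simulate_chain(50, 1.0, rng)
D = first_difference_matrix(len(f))

# Matrix form ||D f||^2 of the exponent
energy_matrix = sum(d * d for d in matvec(D, f))
# Sum form: the first term is (f_1 - f_0)^2 with f_0 = 0
energy_sum = f[0] ** 2 + sum((f[j] - f[j - 1]) ** 2 for j in range(1, len(f)))
```

Both quantities agree up to rounding, which is the equality of the two forms of the exponent for this chain.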



These particular cases give us the possibility to extend the prior model to other 
more sophisticated non-Gaussian models which can be classified in three groups: 



- Separable:

$$p(f) = \prod_j p_j(f_j) \propto \exp\left[-\sum_j \phi_j(f_j)\right] \tag{6}$$

  where $\phi_j$ are positive valued functions. If $\phi_j = \phi, \forall j$,
  then the model is stationary.

- Markovian:

$$p(f) = \prod_j p(f_j|f_{j-1}) \propto \exp\left[-\sum_j \phi_j(f_j - f_{j-1})\right] \tag{7}$$

  where $\phi_j$ are positive valued functions called potential functions of the
  Markovian model. Again here, if $\phi_j = \phi, \forall j$, then we have a
  stationary (homogeneous) Markov model.

Some examples of the $\phi$ expressions used in many applications are:

$$\phi(t) = \left\{ t^2; \quad |t|^\beta, \; 1 < \beta < 2; \quad -t \ln t + 1, \; t > 0; \quad \min(t^2, 1) \right\}$$
3 Modeling using hidden variables 

As mentioned in the introduction, hidden variables give the possibility to model
non Gaussian (NG) and/or non stationary (NS) signals. We present here a few
interesting cases.

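Energy modulated signals: A simple model which can capture energy
or variance modulated signals is [1]:

$$p(f_j|v_j, \lambda) = \mathcal{N}(f_j|0, 2v_j/\lambda) \quad\text{and}\quad p(v_j|\lambda) = \mathcal{G}(v_j|3/2, \lambda) \tag{8}$$

where $\mathcal{G}$ is the Gamma distribution. It is then easy to show the
following relations:

$$p(f_j, v_j|\lambda) \propto \exp\left[-\lambda\left(\frac{f_j^2}{4 v_j} + v_j\right)\right] \tag{9}$$

and

$$p(f, v|\lambda) = \prod_j p(f_j, v_j|\lambda) \propto \exp\left[-\lambda \sum_j \left(\frac{f_j^2}{4 v_j} + v_j\right)\right] \tag{10}$$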
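
The energy modulated model can be simulated in a few lines. The following Python sketch (standard library only) draws pairs $(f_j, v_j)$, assuming that the Gamma law is parameterized by shape $3/2$ and rate $\lambda$; this parameterization and the function names are assumptions of the sketch.

```python
import random

def sample_energy_modulated(n, lam, rng):
    """Draw v_j ~ Gamma(shape 3/2, rate lam), then f_j | v_j ~ N(0, 2 v_j / lam)."""
    f, v = [], []
    for _ in range(n):
        vj = rng.gammavariate(1.5, 1.0 / lam)          # gammavariate takes (shape, scale)
        fj = rng.gauss(0.0, (2.0 * vj / lam) ** 0.5)   # gauss takes a standard deviation
        v.append(vj)
        f.append(fj)
    return f, v

def variance_and_kurtosis(x):
    """Empirical variance and (non-excess) kurtosis m4 / m2^2."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    return m2, m4 / (m2 * m2)

rng = random.Random(1)
f, v = sample_energy_modulated(100_000, lam=2.0, rng=rng)
variance, kurt = variance_and_kurtosis(f)
```

Under these assumptions the marginal variance of $f_j$ is $E[2v_j/\lambda] = 3/\lambda^2$ and its kurtosis is 5, above the Gaussian value of 3, which illustrates numerically that the marginal law of $f_j$ is non Gaussian.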



Amplitude modulated signals: To illustrate this with applications in
telecommunication signal and image processing, we consider the case of a
Gaussian signal modulated with a two level or binary signal. A simple model
which can capture such modulated signals or images is

$$p(f_j|z_j, \lambda) = \mathcal{N}(z_j, 2/\lambda) \tag{11}$$

with $z_j \in \{m_1 = 0, m_2 = 1\}$ and $P(z_j = m_k) = 1/2$, $k = 1, \ldots, K = 2$.
It is then easy to show the following:

$$p(f_j|\lambda) = \sum_{k=1}^{K} (1/2) \, \mathcal{N}(m_k, v_k = 2/\lambda) \tag{12}$$

and $p(f_j|z_j, \lambda) \propto \exp\left[-\lambda (f_j - z_j)^2\right]$ and
$P(z_j|f_j, \lambda) \propto \exp\left[-\lambda (z_j - f_j)^2\right]$.

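Mixture of Gaussians: The previous model can be generalized to the general
mixture of Gaussians. We then have the following relations:

$$p(f_j|z_j = k, m_k, v_k) = \mathcal{N}(m_k, v_k = 2/\lambda_k)$$

$$P(z_j = k) = \pi_k, \quad z_j \in \{1, \ldots, K\}$$

$$p(f_j|\pi, m, v) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(m_k, v_k)$$

and

$$p(f|z, m, \lambda) \propto \exp\left[-\sum_j \sum_k \lambda_k \delta(z_j - k)(f_j - m_k)^2\right] \tag{13}$$

$$p(z|f, m, \lambda, \pi) \propto \exp\left[-\sum_j \sum_k \delta(z_j - k)\left[\lambda_k (f_j - m_k)^2 - \ln \pi_k\right]\right]$$

$$P(z_j = k|f, m, \lambda) \propto \exp\left[-\lambda_k (f_j - m_k)^2 + \ln \pi_k\right] \tag{14}$$

and $p(f, z|\sigma_\epsilon^2, m, \lambda) \propto \exp[-J(f, z)]$ with

$$\begin{aligned} J(f, z) &= \sum_k \sum_{\{j: z_j = k\}} \lambda_k (f_j - m_k)^2 - \sum_k \ln(\pi_k) \sum_j \delta(z_j - k) \\ &= \sum_k \lambda_k \|f_k - m_k 1\|^2 - \sum_k n_k \ln(\pi_k) \end{aligned} \tag{15}$$

where $m = \{m_1, \ldots, m_K\}$, $\lambda = \{\lambda_1, \ldots, \lambda_K\}$,
$\pi = \{\pi_1, \ldots, \pi_K\}$, $n_k = \sum_j \delta(z_j - k)$ is the number of
samples $f_j$ which are in the class $z_j = k$ and $f_k = \{f_j : z_j = k\}$.
The sign of the $n_k \ln(\pi_k)$ term follows from $p(f, z) = p(f|z) P(z)$ with
$P(z) = \prod_k \pi_k^{n_k}$. For more details and applications of such
modeling see [2,3].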
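
The classification role of the hidden label $z_j$ can be illustrated by computing the posterior class probabilities $P(z_j = k|f_j)$ with Bayes' rule. The Python sketch below uses the standard Gaussian density with mean $m_k$ and variance $v_k$, including its normalization constant, so it is a generic illustration rather than a transcription of the proportionality relations above. Function names and numerical values are illustrative.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def class_posteriors(fj, weights, means, variances):
    """Return [P(z_j = k | f_j)] for all classes k, using Bayes' rule."""
    logs = [math.log(w) + gaussian_logpdf(fj, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)                                  # subtract the max for numerical stability
    unnormalized = [math.exp(l - top) for l in logs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Binary example in the spirit of the amplitude modulated model: m_1 = 0, m_2 = 1
weights, means, variances = [0.5, 0.5], [0.0, 1.0], [0.1, 0.1]
posterior_mid = class_posteriors(0.5, weights, means, variances)   # halfway between the means
posterior_low = class_posteriors(0.1, weights, means, variances)   # close to m_1
```

A sample halfway between the two means is assigned to each class with probability one half, while a sample close to $m_1$ is assigned almost surely to the first class.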
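
Mixture of Gauss-Markov model: In the previous model, we assumed that
the samples in each class are independent. Here, we extend this to a Markovian
model:

$$\begin{aligned} p(f_j|z_j = k, z_{j-1} \neq k, f_{j-1}, m_k, v_k) &= \mathcal{N}(m_k, v_k) \\ p(f_j|z_j = k, z_{j-1} = k, f_{j-1}, m_k, v_k) &= \mathcal{N}(f_{j-1}, v_k) \\ P(z_j = k) &= \pi_k, \quad z_j \in \{1, \ldots, K\} \end{aligned} \tag{16}$$

which can be written in a more compact way, if we introduce
$q_j = 1 - \delta(z_j - z_{j-1})$, by

$$p(f_j|q_j, f_{j-1}, m_k, v_k) = \mathcal{N}(q_j m_k + (1 - q_j) f_{j-1}, v_k) \tag{17}$$

which results in:

$$\begin{aligned} p(f|z, m, \lambda) &\propto \exp\left[-\sum_j \sum_k \lambda_k \delta(z_j - k)\left[f_j - (q_j m_k + (1 - q_j) f_{j-1})\right]^2\right] \\ &\propto \exp\left[-\sum_j \sum_k \lambda_k \delta(z_j - k)\left[(1 - q_j)(f_j - f_{j-1})^2 + q_j (f_j - m_k)^2\right]\right] \end{aligned} \tag{18}$$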



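and $p(f, z|\sigma_\epsilon^2, m, \lambda) \propto \exp[-J(f, z)]$ with

$$\begin{aligned} J(f, z) &= \sum_j \sum_k \lambda_k \delta(z_j - k)\left[f_j - (q_j m_k + (1 - q_j) f_{j-1})\right]^2 - \sum_k n_k \ln(\pi_k) \\ &= \sum_j (1 - q_j)(\tilde{f}_j - \tilde{f}_{j-1})^2 - \sum_k n_k \ln(\pi_k) \\ &= \|Q D \tilde{f}\|^2 - \sum_k n_k \ln(\pi_k) \end{aligned} \tag{19}$$

where $\tilde{f}_j$ denotes the sample $f_j$ after the class dependent scaling by
$\lambda_{z_j}$, $D$ is the first order finite difference matrix and $Q$ is
a matrix with $q_j$ as its diagonal elements.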
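
The following Python sketch simulates a signal from the Gauss-Markov mixture model for a given label sequence. It computes the flags $q_j = 1 - \delta(z_j - z_{j-1})$ and then draws each $f_j$ around $m_k$ when the class changes and around $f_{j-1}$ otherwise. Treating the first sample as a class change is an assumption of this sketch, and all names and numerical values are illustrative.

```python
import random

def contour_flags(z):
    """q_j = 1 - delta(z_j - z_{j-1}); the first sample is treated as a change (assumption)."""
    return [1] + [0 if z[j] == z[j - 1] else 1 for j in range(1, len(z))]

def simulate_gauss_markov_mixture(z, means, variances, rng):
    """Draw f_j ~ N(q_j m_k + (1 - q_j) f_{j-1}, v_k) with k = z_j."""
    q = contour_flags(z)
    f = []
    for j, k in enumerate(z):
        center = means[k] if q[j] == 1 else f[j - 1]
        f.append(rng.gauss(center, variances[k] ** 0.5))
    return f, q

rng = random.Random(0)
z = [0] * 20 + [1] * 20 + [0] * 10        # piecewise constant labels
means, variances = [0.0, 5.0], [0.01, 0.04]
f, q = simulate_gauss_markov_mixture(z, means, variances, rng)
n_changes = sum(q)                          # start of the signal plus two class changes
```

Within each segment the signal behaves as a random walk with variance $v_k$ per step, and it is re-anchored at $m_k$ each time the label changes.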

In all these mixture models, we assumed the $z_j$ independent with
$P(z_j = k) = \pi_k$. However, $z_j$ corresponds to the label of the sample
$f_j$. It is then better to put a Markovian structure on it to capture the fact
that, in general, when the neighboring samples of $f_j$ all have the same label,
it is more probable that this sample has the same label too. This feature can
be modeled via the Potts-Markov modeling of the classification labels $z_j$. In
the next section, we use this model and, at the same time, we extend all the
previous models to the 2D case for applications in image processing and to
MIMO applications.

4 Mixture and Hidden Markov Models for images 

In image processing applications, the notions of contours and regions are very
important. In the following, we denote by $r = (x, y)$ the position of a pixel
and by $f(r)$ its gray level, or by $f(r) = \{f_1(r), \ldots, f_N(r)\}$ its color
or spectral components. In the classical RGB color representation $N = 3$, but in
hyperspectral imaging $N$ may be more than one hundred. When the observed data
are also images, we denote them by $g(r) = \{g_1(r), \ldots, g_M(r)\}$.

In ICA problems we have $g = Af$ and in more general BSS problems, we
have $g = Af + \epsilon$, where $A$ is the mixing matrix. In ICA methods, one
often assumes $f = Bg$ where $B$ is called the separating matrix, which is
ideally $B = A^{-1}$.
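
The difference between the two observation models can be seen on a toy example. The Python sketch below mixes two sources with a hypothetical $2 \times 2$ matrix $A$ (known here, which is never the case in practice) and applies the ideal separating matrix $B = A^{-1}$ to noiseless and noisy observations. All numerical values and names are illustrative.

```python
import random

def matvec2(M, x):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def inverse2(M):
    """Inverse of a 2x2 matrix with non zero determinant."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

rng = random.Random(0)
A = [[1.0, 0.6], [0.4, 1.0]]          # hypothetical mixing matrix
B = inverse2(A)                       # ideal separating matrix

sources = [[rng.uniform(-1.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
noise_std = 0.05
clean = [matvec2(A, f) for f in sources]                                   # g = A f
noisy = [[gi + rng.gauss(0.0, noise_std) for gi in g] for g in clean]      # g = A f + e

def max_error(observations):
    """Largest absolute deviation between B g and the true sources."""
    return max(abs(fh - ft)
               for g, f in zip(observations, sources)
               for fh, ft in zip(matvec2(B, g), f))

error_clean = max_error(clean)
error_noisy = max_error(noisy)
```

In the noiseless case $Bg$ recovers the sources up to rounding, whereas with noise the estimate $Bg = f + B\epsilon$ keeps a residual error. This is one motivation for treating BSS as an inference problem with explicit prior models for the sources.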

For any image $f_j(r)$ we denote by $q_j(r)$, a binary valued hidden variable,
its contours and by $z_j(r)$, a discrete valued hidden variable, its region
labels. We focus here on images with homogeneous regions and use the mixture
models of the previous section with an additional Markov model for the
hidden variable $z_j(r)$.

Homogeneous regions modeling: In general, any image $f_j(r), r \in \mathcal{R}$
is composed of a finite set of $K_j$ homogeneous regions $R_{jk}$ with given
labels $z_j(r) = k$, $k = 1, \ldots, K_j$ such that $R_{jk} = \{r : z_j(r) = k\}$,
$\mathcal{R}_j = \cup_k R_{jk}$, with the corresponding pixel values
$f_{jk} = \{f_j(r) : r \in R_{jk}\}$ and $f_j = \cup_k f_{jk}$. Hidden Markov
modeling (HMM) is a very general and efficient way to model such images
appropriately. The main idea is to assume that all the pixel values
$f_{jk} = \{f_j(r), r \in R_{jk}\}$ of a homogeneous region $k$ follow a given
probability law, for example a Gaussian $\mathcal{N}(m_{jk} 1, \Sigma_{jk})$
where $1$ is a generic vector of ones of size $n_{jk}$, the number of pixels
in region $k$.

In the following, we consider two cases:

- The pixels in a given region are assumed iid:

$$p(f_j(r)|z_j(r) = k) = \mathcal{N}(m_{jk}, \sigma_{jk}^2), \quad k = 1, \ldots, K_j \tag{20}$$

  and thus

$$p(f_{jk}|z_j(r) = k) = p(f_j(r), r \in R_{jk}) = \mathcal{N}(m_{jk} 1, \sigma_{jk}^2 I) \tag{21}$$

  This corresponds to the classical separable and mono-variate mixture models.

- The pixels in a given region are assumed to be locally dependent:

$$p(f_{jk}|z_j(r) = k) = p(f_j(r), r \in R_{jk}) = \mathcal{N}(m_{jk} 1, \Sigma_{jk}) \tag{22}$$

  where $\Sigma_{jk}$ is an appropriate covariance matrix. This corresponds to
  the classical separable but multivariate mixture models.

In both cases, the pixels in different regions are assumed to be independent:

$$p(f_j) = \prod_{k=1}^{K_j} p(f_{jk}) = \prod_{k=1}^{K_j} \mathcal{N}(m_{jk} 1, \Sigma_{jk}) \tag{23}$$
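
For the iid case, generating an image from a given label map is straightforward. The Python sketch below draws each pixel from the Gaussian law of its region and then checks the empirical region means. The label map, the means and the standard deviations are illustrative values chosen for this sketch.

```python
import random

def sample_region_image(labels, means, sigmas, rng):
    """Draw f(r) ~ N(m_k, sigma_k^2) independently, with k = z(r) the label of pixel r."""
    return [[rng.gauss(means[k], sigmas[k]) for k in row] for row in labels]

def region_mean(image, labels, k):
    """Empirical mean of the pixels carrying label k."""
    values = [v for image_row, label_row in zip(image, labels)
              for v, z in zip(image_row, label_row) if z == k]
    return sum(values) / len(values)

rng = random.Random(0)
labels = [[0] * 10 + [1] * 10 for _ in range(20)]   # two regions: left and right halves
means, sigmas = [0.0, 5.0], [0.5, 1.0]
image = sample_region_image(labels, means, sigmas, rng)

mean_left = region_mean(image, labels, 0)
mean_right = region_mean(image, labels, 1)
```

Each region mean is close to the corresponding $m_{jk}$, with a deviation of the order of $\sigma_{jk} / \sqrt{n_{jk}}$.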

Modeling the labels: Noting that all the models (20), (21) and (22) are
conditioned on the value of $z_j(r) = k$, they can be rewritten in the following
general form

$$p(f_{jk}) = \sum_k P(z_j(r) = k) \, \mathcal{N}(m_{jk} 1, \Sigma_{jk}) \tag{24}$$

where $\Sigma_{jk}$ is either a diagonal matrix $\Sigma_{jk} = \sigma_{jk}^2 I$ or
not. Now, we also need to model the vector variables
$z_j = \{z_j(r), r \in \mathcal{R}\}$. Here also, we can consider two cases:

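- Independent Gaussian Mixture model (IGM), where $\{z_j(r), r \in \mathcal{R}\}$
  are assumed to be independent and

$$P(z_j(r) = k) = p_k, \quad\text{with}\quad \sum_k p_k = 1 \quad\text{and}\quad p(z_j) = \prod_k p_k^{n_{jk}} \tag{25}$$

  with $n_{jk}$ the number of pixels in region $k$.

- Contextual Gaussian Mixture model (CGM), where
  $z_j = \{z_j(r), r \in \mathcal{R}\}$ are assumed to be Markovian

$$p(z_j) \propto \exp\left[\alpha \sum_{r \in \mathcal{R}} \sum_{s \in \mathcal{V}(r)} \delta(z_j(r) - z_j(s))\right] \tag{26}$$

  which is the Potts Markov random field (PMRF), with $\mathcal{V}(r)$ the set
  of neighbors of pixel $r$. The parameter $\alpha$ controls the mean value of
  the regions' sizes.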
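
The exponent of the Potts model simply counts agreeing neighbor pairs. The Python sketch below evaluates this count for small label images, assuming a four nearest neighbor system and counting each ordered pair $(r, s)$ as in the double sum of the model. The function names and test images are illustrative.

```python
def potts_agreements(z):
    """Count ordered pairs (r, s), s in the 4-neighborhood of r, with z(r) == z(s)."""
    rows, cols = len(z), len(z[0])
    count = 0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols and z[a][b] == z[i][j]:
                    count += 1
    return count

def potts_log_potential(z, alpha):
    """Unnormalized log probability alpha * (number of agreeing neighbor pairs)."""
    return alpha * potts_agreements(z)

uniform = [[0] * 4 for _ in range(4)]
halves = [[0, 0, 1, 1] for _ in range(4)]
checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]

n_uniform = potts_agreements(uniform)   # 48: all 24 neighbor pairs, counted twice
n_halves = potts_agreements(halves)     # 40: the 4 pairs across the boundary disagree
n_checker = potts_agreements(checker)   # 0: every neighbor pair disagrees
```

For $\alpha > 0$ the model therefore favors label images with few, large regions, and increasingly so as $\alpha$ grows.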

Hyperparameters prior law: The final point before obtaining an expression
for the posterior probability law of all the unknowns, i.e., $p(f, \theta|g)$, is
to assign a prior probability law $p(\theta)$ to the hyperparameters $\theta$.
Even if this point has been one of the main points of discussion between the
Bayesian and classical statistical research communities, and there are still
many open problems, we choose here to use conjugate priors for simplicity.
Conjugate priors have at least two advantages: 1) they can be considered as a
particular family of a differential geometry based family of priors [4] and
2) they are easy to use because the prior and the posterior probability laws
stay in the same family. In our case, we need to assign prior probability laws
to the means $m_{jk}$, to the variances $\sigma_{jk}^2$ or to the covariance
matrices $\Sigma_{jk}$, and also to the covariance matrices of the noises
$\epsilon_i$ of the likelihood functions. The conjugate priors for the means
$m_{jk}$ are in general the Gaussians $\mathcal{N}(m_{jk0}, \sigma_{jk0}^2)$,
those for the variances $\sigma_{jk}^2$ are the inverse Gammas
$\mathcal{IG}(\alpha_0, \beta_0)$ and those for the covariance matrices
$\Sigma_{jk}$ are the inverse Wisharts $\mathcal{IW}(\alpha_0, \Lambda_0)$.
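
The practical benefit of conjugacy is that the posterior parameters are available in closed form. The Python sketch below gives the two standard scalar updates: a Gaussian prior on a region mean with known variance, and an inverse Gamma prior on a variance with known mean. Function names and numerical values are illustrative, and the multivariate inverse Wishart case is not covered.

```python
def gaussian_mean_posterior(samples, noise_var, prior_mean, prior_var):
    """Posterior N(mean, var) of m for samples ~ N(m, noise_var) and prior N(prior_mean, prior_var)."""
    n = len(samples)
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + sum(samples) / noise_var) / precision
    return mean, 1.0 / precision

def inverse_gamma_variance_posterior(samples, known_mean, alpha0, beta0):
    """Posterior IG(alpha, beta) of the variance for samples ~ N(known_mean, variance)."""
    n = len(samples)
    squared_error = sum((x - known_mean) ** 2 for x in samples)
    return alpha0 + n / 2.0, beta0 + squared_error / 2.0

pixels = [1.0, 2.0, 3.0]
post_mean, post_var = gaussian_mean_posterior(
    pixels, noise_var=1.0, prior_mean=0.0, prior_var=1.0)      # (1.5, 0.25)
post_alpha, post_beta = inverse_gamma_variance_posterior(
    pixels, known_mean=2.0, alpha0=2.0, beta0=1.0)             # (3.5, 2.0)
```

In both cases the posterior stays in the family of the prior, so that sampling or averaging over these hyperparameters only requires updating two numbers per region.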

Expressions of likelihood, prior and posterior laws: We now have all
the elements for writing the expressions of the posterior laws. We summarize
them here:

- Likelihood:
  $p(g|f, \theta) = \prod_{i=1}^{M} p(g_i|f, \Sigma_{\epsilon_i}) = \prod_{i=1}^{M} \mathcal{N}(g_i|[Af]_i, \Sigma_{\epsilon_i})$
  where we assumed that the noises $\epsilon_i$ are independent, centered and
  Gaussian with covariance matrices $\Sigma_{\epsilon_i}$ which, hereafter, are
  also assumed to be diagonal: $\Sigma_{\epsilon_i} = \sigma_{\epsilon_i}^2 I$.

- HMM for the images:
  $p(f|z, \theta) = \prod_{j=1}^{N} p(f_j|z_j, m_j, \Sigma_j)$
  where we used $z = \{z_j, j = 1, \ldots, N\}$ and where we assumed that the
  $f_j|z_j$ are independent.

- PMRF for the labels:
  $p(z) = \prod_{j=1}^{N} p(z_j)$ with
  $p(z_j) \propto \exp\left[\alpha \sum_{r \in \mathcal{R}} \sum_{s \in \mathcal{V}(r)} \delta(z_j(r) - z_j(s))\right]$
  where we used the simplified notation
  $p(z_j) = P(Z_j(r) = z(r), r \in \mathcal{R})$ and where we assumed that
  $\{z_j, j = 1, \ldots, N\}$ are independent.

- Conjugate priors for the hyperparameters:
  $p(m_{jk}) = \mathcal{N}(m_{jk0}, \sigma_{jk0}^2)$,
  $p(\sigma_{jk}^2) = \mathcal{IG}(\alpha_{j0}, \beta_{j0})$,
  $p(\Sigma_{jk}) = \mathcal{IW}(\alpha_{j0}, \Lambda_{j0})$,
  $p(\sigma_{\epsilon_i}^2) = \mathcal{IG}(\alpha_{i0}, \beta_{i0})$.

- Joint posterior law of $f$, $z$ and $\theta$:

$$p(f, z, \theta|g) \propto p(g|f, \theta_1) \, p(f|z, \theta_2) \, p(z|\theta_2) \, p(\theta)$$

4.1 Bayesian estimators and computational methods 

The expression of this joint posterior law is, in general, known only up to a
normalization factor. This means that, if we consider the Joint Maximum A
Posteriori (JMAP) estimate

$$(\hat{f}, \hat{z}, \hat{\theta}) = \arg\max_{(f, z, \theta)} \{p(f, z, \theta|g)\} \tag{27}$$

we need a global optimization algorithm, but if we consider the Minimum Mean
Square Estimator (MMSE) or equivalently the Posterior Mean (PM) estimates,
then we need to compute this factor, which requires integrations in very high
dimensions. There are however three main approaches to do Bayesian computation:

Laplace approximation: When the posterior law is unimodal, it is reasonable
to approximate it with an equivalent Gaussian, which then allows all the
computations to be done analytically. Unfortunately, very often
$p(f, z, \theta|g)$ may be Gaussian as a function of $f$ only, but not as a
function of $z$ or $\theta$. So, in general, this approximation method cannot
be used for all variables.

Variational and mean field approximation: The main idea behind this
approach is to approximate the joint posterior $p(f, z, \theta|g)$ with another
simpler distribution $q(f, z, \theta|g)$ for which the computations can be done.
A first simpler distribution $q(f, z, \theta|g)$ is a separable one:

$$q(f, z, \theta|g) = q_1(f) \, q_2(z) \, q_3(\theta) \tag{28}$$

This at least reduces the integration computations to the product of three
separate ones. This process can again be applied to any of these three
distributions, for example $q_1(f) = \prod_j q_{1j}(f_j)$. With the Gaussian
mixture modeling we proposed, $q_1(f)$ can be chosen to be Gaussian and
$q_2(z)$ can be separated into two parts $q_{2B}(z)$ and $q_{2W}(z)$, where the
pixels of the images are separated in two classes B and W as in a checkerboard.
This is possible thanks to the properties of the proposed Potts-Markov model
with the four nearest neighborhood, which gives the possibility to use
$q_{2B}(z)$ and $q_{2W}(z)$ separately. For $q_3(\theta)$, very often we also
choose a separable distribution which uses the conjugacy properties of the
prior distributions.
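
The checkerboard argument rests on a simple property of the four nearest neighbor system: no two pixels of the same color are neighbors, so all pixels of one color can be handled together given the other color. The Python sketch below verifies this on a small grid. The grid size and function names are illustrative.

```python
def four_neighbors(i, j, rows, cols):
    """Sites of the four nearest neighbor system that fall inside the grid."""
    candidates = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
    return [(a, b) for a, b in candidates if 0 <= a < rows and 0 <= b < cols]

def checkerboard(rows, cols):
    """Split the pixel sites into two classes B and W as on a checkerboard."""
    black = {(i, j) for i in range(rows) for j in range(cols) if (i + j) % 2 == 0}
    white = {(i, j) for i in range(rows) for j in range(cols) if (i + j) % 2 == 1}
    return black, white

rows, cols = 4, 5
black, white = checkerboard(rows, cols)

# Every neighbor of a black site is white, and conversely
black_neighbors_all_white = all(
    n in white for (i, j) in black for n in four_neighbors(i, j, rows, cols))
white_neighbors_all_black = all(
    n in black for (i, j) in white for n in four_neighbors(i, j, rows, cols))
```

Under the Potts model the labels of the B pixels are therefore conditionally independent given the W pixels, which is what allows the two parts to be treated separately.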

Markov Chain Monte Carlo (MCMC) methods: These methods give
the possibility to explore the joint posterior law and compute the necessary
posterior mean estimates. In our case, we propose the general MCMC Gibbs
sampling algorithm to estimate $f$, $z$ and $\theta$ by first separating the
unknowns in two sets, $p(f, z|\theta, g)$ and $p(\theta|f, z, g)$. Then, we
separate again the first set in two subsets, $p(f|z, \theta, g)$ and
$p(z|\theta, g)$. Finally, when possible, using the separability along the
channels, we separate these two last terms in $p(f_j|z_j, \theta_j, g_j)$ and
$p(z_j|\theta_j, g_j)$. The general scheme is then, using these expressions, to
generate samples $f^{(n)}, z^{(n)}, \theta^{(n)}$ from the joint posterior law
$p(f, z, \theta|g)$ and, after the convergence of the Gibbs sampler, to compute
their means and to use them as the posterior estimates.
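
The alternating structure of a Gibbs sampler can be illustrated on a much simpler problem than the full model of this paper. The Python sketch below considers a 1D mixture of $K$ Gaussians with known common variance and equal class weights. It alternates between sampling the labels given the class means and sampling the class means given the labels, then averages the means after a burn-in period. All modeling simplifications, names and numerical values are assumptions of this toy example.

```python
import math
import random

def sample_labels(f, means, var, rng):
    """Draw z_j from P(z_j = k | f_j, means), assuming equal class weights."""
    z = []
    for fj in f:
        d = [(fj - m) ** 2 / (2.0 * var) for m in means]
        dmin = min(d)
        weights = [math.exp(dmin - dk) for dk in d]   # shifted to avoid underflow
        z.append(rng.choices(range(len(means)), weights=weights)[0])
    return z

def sample_means(f, z, K, var, prior_mean, prior_var, rng):
    """Draw each m_k from its Gaussian conditional (conjugate Gaussian prior)."""
    means = []
    for k in range(K):
        members = [fj for fj, zj in zip(f, z) if zj == k]
        precision = 1.0 / prior_var + len(members) / var
        center = (prior_mean / prior_var + sum(members) / var) / precision
        means.append(rng.gauss(center, precision ** -0.5))
    return means

def gibbs_posterior_means(f, K, var, n_iter, burn_in, rng):
    """Alternate the two conditional draws and average the means after burn-in."""
    lo, hi = min(f), max(f)
    means = [lo + (k + 0.5) * (hi - lo) / K for k in range(K)]   # spread-out start
    totals = [0.0] * K
    for it in range(n_iter):
        z = sample_labels(f, means, var, rng)
        means = sample_means(f, z, K, var, prior_mean=0.0, prior_var=100.0, rng=rng)
        if it >= burn_in:
            totals = [t + m for t, m in zip(totals, means)]
    return [t / (n_iter - burn_in) for t in totals]

rng = random.Random(0)
data = ([rng.gauss(0.0, 0.5) for _ in range(100)]
        + [rng.gauss(4.0, 0.5) for _ in range(100)])
estimates = gibbs_posterior_means(data, K=2, var=0.25, n_iter=150, burn_in=50, rng=rng)
```

The full problem adds the image values $f$, the Potts prior on $z$ and the remaining hyperparameters, but each of them is handled in the same way: one conditional draw per block, repeated until convergence.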

In this paper we do not detail these methods. However, we refer
here to the application of these models in different areas of signal and image
processing, and in particular in BSS [4,5].

5 Conclusion 

In this paper, we first proposed a few new models for modeling signals and
images that are either stationary but non Gaussian, Gaussian but non
stationary, or non Gaussian and non stationary. Then, we showed how to use
these models in ICA or BSS. The computational aspects of the Bayesian
estimation framework associated with these prior models were also discussed.